Abstract

The authors assessed the quality of single-case design (SCD) studies that assess the impact of interventions on outcomes 
for individuals who are deaf or hard-of-hearing (DHH). More specifically, the What Works Clearinghouse (WWC) standards 
for SCD research were used to assess the design quality and strength of evidence of studies available in the peer-reviewed,
published literature. The analysis yielded four studies that met the WWC standards for design quality, of which
two demonstrated moderate to strong evidence for efficacy of the studied intervention. Results of this review are discussed 
in light of the benefits and challenges to applying the WWC design standards to research with DHH individuals and other 
diverse, low-incidence populations. 


There are many challenges in establishing a robust foundation of 
interventions and evidence-based practices for individuals who 
are deaf or hard-of-hearing (DHH) (Luckner, 2006; Luckner, Sebald, 
Cooney, Young, & Muir, 2005). 1 Some of these challenges in the 
field stem from the demographic reality of assessing diverse DHH 
student populations across a broad range of educational settings 
that vary in the extent to which interventions are tailored to fit 
the needs of DHH students. Other challenges result from the few
resources available to conduct such studies, particularly needed
replications of prior findings. Replication across research teams
is also challenging because, in many cases, there are divergent
perspectives on the theoretical underpinnings of the field. From
a measurement perspective, the low incidence and heterogeneity
of the DHH population restrict the range of options for
experimental research designs with sufficient numbers of students
in control and treatment conditions (with sufficient covariates),
thus limiting the breadth and scope of many educational
research methodologies that can be feasibly implemented and
lead to strong causal inferences. This, in turn, reduces the depth
of the research literature from which causal inferences about
effective interventions for DHH students might be made.

Evidence-based practices in many areas in education
are informed by results from large-scale experimental studies
such as randomized controlled trials (RCTs; Thompson, Diamond,
McWilliam, P. Snyder, & S. Snyder, 2005). Alternative methods to
traditional group-based experimental designs include descrip- 
tive approaches, such as case studies, and other empirical 
approaches, such as single-case designs (SCDs) (Horner, Carr, 
Halle, Odom, & Wolery, 2005). SCDs include repeated observa- 
tions of participants across multiple phases of the study, from 
before the intervention through follow-up, using a participant’s 
individual changes in behaviors as the point of reference for 
interpreting degree and cause of change. In recent years there 
has been increased recognition of the importance of the SCD for 
estimating the effectiveness of interventions for low-incidence 
populations (Kratochwill et al., 2013; Shadish & Sullivan, 2011), 
including individuals who are DHH (e.g., Antia & Kreimeyer,
2003; Beal-Alvarez, Lederberg, & Easterbrooks, 2012). 

In the past, a lack of explicit protocols for experimental
design and data interpretation has been an inherent challenge
in evaluating the quality of SCD research and the strength
of resultant findings. More specifically, the use of visual analysis
as the primary means of interpreting intervention effects
is a major criticism of SCDs (Kazdin, 2011), because visually
analyzed results are vulnerable to researcher bias, subjectivity,
and inconsistency. Regardless of these drawbacks, the potential
for SCDs to contribute to the advancement of research among 
low-incidence groups has been widely acknowledged. Research 
utilizing SCDs is a methodologically sound and functionally 
appropriate way to construct experimental research studies with 
individuals who are DHH. Most notably, SCDs are often compat- 
ible with research on topics in deaf studies and deaf education, 
such as language development, literacy, and communication 
strategies. These studies often demonstrate observable out- 
comes at the individual level, allowing for analysis of a specific 
individual's response to treatment and a meaningful interpre- 
tation of treatment outcomes that is responsive to population 
diversity (Bergeron, Lederberg, Easterbrooks, Miller, & Connor, 
2009; Long, 1996; Odom et al., 2005; Schirmer & McGough, 2005). 

Although there have been a number of SCD studies with 
DHH participants, a systematic review of the scope, quality, or 
methodological rigor of these publications has yet to be con- 
ducted. Given the paucity of intervention research that includes 
DHH individuals, and the compatibility of SCD approaches with 
measuring intervention outcomes among low-incidence groups 
(Horner et al., 2005), an exploration of the quality and scope of 
SCD studies with participants who are DHH is warranted. The 
purpose of this article is to evaluate the alignment of research 
using SCD across DHH literature with the guidelines provided 
by the What Works Clearinghouse (WWC), a project sponsored 
by the U.S. Department of Education to evaluate the strength 
of evidence for interventions studied within education research 
(http://ies.ed.gov/ncee/wwc/). (Specific WWC guidelines for 
SCDs are discussed in the Method section.) In this article we 
first provide a more detailed discussion of the implications of 
research design decisions in studies that include DHH partici- 
pants, specifically looking at group-based designs and SCDs. 
We offer this discussion of group-based designs as a point of 
comparison because of the prevalence of this design approach 
in intervention research. We then describe the WWC criteria by 
which we evaluated each included study and the types of indi- 
cators that varied between SCDs with DHH participants. This 
article concludes with a discussion on the state of SCD research 
within the field (in accordance with the WWC standards) and 
future directions for research. 

Group-Based Versus SCDs 

In this section, we offer a comparison of a widely used method 
of demonstrating causal relationships within intervention 
research (i.e., group-based experimental research designs) and 
SCDs. We provide this comparison in order to illustrate the 
appropriateness of SCDs to demonstrate participant outcomes 
on an individual level, as group-based designs provide a suitable 
contrast to illustrate this concept. Please note that the following 
discussion does not encompass the full range of methodologies 
that can be used to draw causal inferences; for a comprehen- 
sive review of quantitative research designs that can be used to 
demonstrate causal relationships, please refer to Shadish, Cook, 
and Campbell (2002). 


Group-Based Designs 

In order to establish that a treatment or practice is effective, 
researchers need to demonstrate that a causal relationship 
exists between an intervention and student outcome variables 
(Tankersley, Harjusola-Webb, & Landrum, 2013). A causal rela- 
tionship is established when participant outcomes are dem- 
onstrated to be the product of an intervention while external 
factors are simultaneously ruled out as the source of change 
(Barger-Anderson, Domaracki, Kearney-Vakulick, & Kubina, 
2004). Although group-based research, specifically RCTs, has 
been historically regarded as the “gold standard” for determin- 
ing causal relationships between intervention and outcome var- 
iables (Borckardt et al., 2008; Parker, Davidson, & Banda, 2007), 
there are significant limitations to using group designs with 
low-incidence populations. One limitation is the challenge of 
having enough DHH students in treatment and control condi- 
tions to have sufficient power to detect an effect of the inter- 
vention, if it exists. For example, there may not be enough DHH 
participants with similar demographic backgrounds to be able 
to “match” students in a matched-pair random assignment. As a 
further example, in order to gain the needed samples, research- 
ers often face significant nesting issues (e.g., all students who 
are DHH in a district receive services in the same classroom 
or school location) or fidelity issues (e.g., treatments must be
administered across multiple sites, introducing variance from
different administrators, interpreters, and other providers of the
intervention) (Luckner, 2006; Luckner et al., 2005).
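
To make the sample-size constraint concrete, the sketch below applies the standard normal-approximation formula for a two-group comparison of means, n per group = 2(z_{1-alpha/2} + z_{power})^2 / d^2. The formula is a textbook approximation rather than something taken from the studies reviewed here, and the effect sizes, alpha level, and power target are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided,
    two-group comparison of means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the power target
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical standardized effect sizes (assumed values for illustration).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Even under these simplified assumptions, a medium effect calls for dozens of participants in each condition, which may be difficult to recruit from a low-incidence population before nesting or matching requirements are considered.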

Even when sufficient numbers of DHH participants can be
assigned to treatment and control conditions of an experimental
study, there are still significant obstacles to drawing appropriate
inferences, because nuances of the DHH population are challenging
to account for in a group-based research design. Individuals
who are deaf are a highly heterogeneous population with 
characteristics that often intersect between language, disabil- 
ity, communication, and cultural identity (Baker-Shenk & Kyle, 
1990; De Clerck, 2010; Najarian, 2008; Wheeler-Scruggs, 2003). For 
research with DHH participants, it is particularly important to 
consider possible confounds related to constructs such as lan- 
guage and communication modalities or disability status (e.g., 
learning disability), as these could have a considerable effect on 
intervention outcomes across a highly heterogeneous group of 
individuals (Antia & Kreimeyer, 2003; Bullis & Anderson, 1986). 
Even when sufficient numbers of participants might be available
for a group design, researchers may still opt for an SCD approach
because group designs are limited in the extent to which a variety
of differing demographic dimensions of participants can be
represented at the same time as investigating the specific effects
of the intervention.

Because of the heterogeneity of the DHH population, the degree
of success of interventions and resultant practices is often
dependent upon the match between the strategy or practice and
the individual student characteristics. Group-based data, particularly
those presented without an investigation of interaction
effects, often mask important moderators related to participant 
demographic variables. As a result, researchers using the group 
design approach will have a limited understanding of specific 
treatment needs and intervention responsiveness between DHH 
subgroups. For example, individuals who are native deaf and 
those who are late-deafened are likely to respond differently to 
a training intervention for increasing functional communication 
skills, given the varying degrees of language exposure and divergent
communication styles between these groups. This variability
in both etiology and characteristics has implications for the
generalizability of intervention research across DHH subpopulations
(e.g., individuals who are deaf with additional disabilities),
in that aggregated outcome data are not ideal for interpreting
how individuals may differentially respond to
the intervention or strategy.

Single-Case Designs 

Due to the focus on individual level participant outcomes 
(rather than aggregated group outcomes), SCDs allow for a more 
detailed examination of differences across subpopulations 
within the broader context of individuals who are DHH. Take, 
for example, a group design approach to looking at the effect of 
an intervention with students from a variety of home language 
contexts. A typical grouping might be to divide participants into 
three groups: families with at least one parent who is fluent in 
sign language, families with parents who learn sign while the 
child is growing up, and families with parents who do not use 
sign at all. This would give three groups to use in analysis as 
either an independent variable (IV) or a covariate. In an SCD 
approach, instead of forcing a group delineation, it would be 
possible to meaningfully utilize multiple variables that form a 
robust picture of the participant’s home language context. Each
individual case would have a profile that could include a greater 
number of relevant variables such as the sign language profi- 
ciency of both parents (where applicable), siblings’ use of sign 
and hearing status, access to language models outside of the 
home, early intervention resources, and so forth. The individual 
participant’s full profile would be displayed alongside the baseline
and response to interventions. In comparison with a group
design, the SCD approach provides greater context for interpreting
outcome data.

Although there are multiple variations of the SCD, each type 
shares some common characteristics. The foundational require- 
ment of SCD is related to the staggered application of the inter- 
vention across participants, and the use of repeated measures 
of outcome variables across cases. This means that whenever 
treatment is applied to one case (e.g., a participant, setting, 
or behavior) and not to another, there exists a comparison 
between the treatment and no-treatment group (Kazdin, 2011). 
For a visual depiction of staggered treatment application, see
Figure 1. Staggering the introduction of an intervention across
cases allows for more stringent analysis of outcomes among 
different participants, behaviors, or settings. More specifically, 
gathering multiple data measurements allows researchers to 
closely examine trends in performance during the baseline and 
intervention conditions (referred to as “phases”), and allows 
researchers to rule out competing explanations for changes 
in the dependent variable between baseline and intervention 
phases (Kelly, 1998). 

During the baseline phase, or the “A phase,” data is collected 
on a participant's performance prior to the introduction of the 
intervention. The purpose of the baseline phase is to provide a 
basis of comparison between the treatment and no-treatment 
conditions (Kazdin, 2011). Researchers can determine the like- 
lihood that an individual’s performance will remain stable 
in the absence of an intervention by examining data patterns 
from baseline measurements. Conclusions can be drawn about 
the efficacy of an intervention when there are changes to the 
data pattern during the intervention phase, or “B phase,” rela- 
tive to baseline measurements. Additionally, data collected 
from the baseline phase allows researchers to observe an indi- 
vidual’s performance or behavior in the absence of intervention. 
This type of descriptive analysis is useful for determining the 
extent of the problem and to confirm that the intervention and 
intended outcomes are well suited for the participant. One limi- 
tation to measuring participant outcomes over time is known 
as the threat of maturation, which refers to the natural changes 
that can occur in participants over time. Changes due to matu- 
ration threaten internal validity if they could have produced the 
outcome that would otherwise be attributed to the intervention 
(Shadish et al., 2002). Threats of maturation can be minimized 
by ensuring that participant groups are similar across variables 
that are susceptible to changes across time (i.e., age, geographic 
location) (Murray, 1998). 

A commonly used variation of the SCD in deaf studies 
and deaf education research is the multiple-baseline design. 
Common to all types of multiple-baseline designs, treatment 
effects are demonstrated by systematically introducing an 
intervention at different points in time across different baselines
(i.e., participants, settings, or behaviors), and causal relationships
between two or more variables are determined by
comparing data between intervention and baseline phases. The
strength of documented intervention effects using a multiple-baseline
design is dependent on the nonconcurrent application
of the IV across cases.

Figure 1. Example diagram of a single-case, multiple-baseline design across participants. Phase labels: Baseline, Intervention.

The multiple-baseline design is often used with interven- 
tions that are intended to generate outcomes that are not 
reversible, such as literacy or communication interventions 
(Barger-Anderson et al., 2004). The multiple-baseline design 
allows investigators to determine both the immediate and gen- 
eralized effects of a treatment across multiple settings, par- 
ticipants, or behaviors. SCD is a suitable methodology across 
a number of different research scenarios, such as communica- 
tion, behavioral, or social skills interventions. When investigat- 
ing participant outcomes related to communication skills, for 
example, it is important to assess the persistence and gener- 
alization of those skills across time and different settings (e.g., 
work, school, and home). SCD calls for continuous or periodic 
monitoring of skill generalization during intervention and 
maintenance phases, which allows investigators to determine 
how much of a particular treatment is needed to produce skill 
generalization across settings. Furthermore, because interven- 
tions for individuals who are DHH are often conducted in one 
type of setting, such as a single classroom or school, assess- 
ing the generality of outcomes outside of the direct training 
environment is necessary to determine the efficacy of the 
intervention. 

In the multiple-baseline design, intervention data can be 
collected continuously throughout the experiment, or periodi- 
cally, after a stable baseline is reached (Horner & Baer, 1978). 
Continuous (as opposed to periodic) data collection with the 
multiple-baseline design is a desirable approach because it pro- 
vides a more complete representation of the effects of an inter- 
vention, especially with regard to the generality of outcomes. 
A significant advantage to the multiple-baseline design is that 
periodic data collection can be used strategically to demonstrate 
both the immediate and persisting effects of an intervention 
(Kaiser, 2014). A major drawback to using continuous data col- 
lection is that it is often very time-consuming and costly, espe- 
cially when there are multiple behaviors or settings of interest 
or if data is collected across multiple contexts. 

A variation on the multiple-baseline design that uses periodic 
data collection is known as the multiple-probe design (Horner & 
Baer, 1978; Kazdin, 2011; Richards, Taylor, & Ramasamy, 2014). 
Periodic data collection is especially efficient for assessing the 
generalizability of outcomes, or the persistence of effects after 
the intervention has ended, as is usually desired in behavioral, 
communication, or literacy interventions. Because many inter- 
ventions that are developed for the DHH population are utilized 
within these domains, the multiple-baseline design can be con- 
sidered an exceptionally compatible approach for experimental 
research in deaf studies and deaf education. 

Figure 1 illustrates the use of a multiple-baseline design 
across participants for a hypothetical intervention study target- 
ing expressive communication in DHH participants. The graph 
demonstrates the effects of a cued speech intervention (IV) on 
the percent of correctly expressed vocabulary words (dependent 
variable) for each participant. Consider the data for the first par- 
ticipant represented in this figure, Rachel. The five data points 
collected during the baseline phase reflect her performance in 
the absence of the intervention. This data provides a basis of 
comparison for her performance after the introduction of the 
intervention. Notice the intervention phase for the next participant,
Sarah, began at session 10. Sarah’s baseline data for sessions
6 through 10 serve as a control condition for the initial
sessions of Rachel’s intervention phase. Another control condition
occurs for Jackie’s case, during sessions 11 through 15.
Because there was a change in each participant’s performance 
after the introduction of the intervention, but no changes to 
the baseline data during each participant’s control condition, 
we can more confidently attribute the observed changes to the 
intervention. 
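
The staggered logic in this example can be sketched in a few lines of Python. The intervention start sessions below are assumptions chosen so that the resulting control windows match the session ranges described above (6 through 10 and 11 through 15); they are not read from the figure itself, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed first intervention session for each participant (illustrative only).
INTERVENTION_START: Dict[str, int] = {"Rachel": 6, "Sarah": 11, "Jackie": 16}


def control_windows(starts: Dict[str, int]) -> List[Tuple[str, str, range]]:
    """Pair each treated case with the next case still in baseline and
    return the baseline sessions that overlap the treated case's
    initial intervention sessions."""
    ordered = sorted(starts.items(), key=lambda item: item[1])
    windows = []
    for (treated, t_start), (control, c_start) in zip(ordered, ordered[1:]):
        # From the treated case's first intervention session up to, but not
        # including, the control case's own first intervention session.
        windows.append((treated, control, range(t_start, c_start)))
    return windows


def is_staggered(starts: Dict[str, int]) -> bool:
    """True when no two cases begin the intervention in the same session."""
    return len(set(starts.values())) == len(starts)


for treated, control, sessions in control_windows(INTERVENTION_START):
    print(f"{control}'s baseline sessions {sessions.start}-{sessions.stop - 1} "
          f"act as a control for {treated}'s intervention phase")
```

If two cases shared a start session, `is_staggered` would return False and there would be no concurrent baseline to compare against, which is the situation the multiple-baseline design is meant to avoid.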

The Current Study 

Due to the limited number of identified evidence-based prac- 
tices that have been linked to positive outcomes in DHH individ- 
uals (Easterbrooks, 2010), it is especially important to conduct 
and evaluate empirical research with consistent and rigorous 
standards of methodological quality. The purpose of the pre- 
sent analysis was to answer the following questions: (a) To what 
extent is published SCD research with DHH individuals aligned 
to the standards established by the WWC? (b) What are the spe- 
cific methodological strengths and weaknesses of the studies 
within the SCD literature with individuals who are DHH? 

Method 

Article Search Strategy 

The initial article search was conducted using the publications 
databases ERIC, PsycINFO, PsycARTICLES, Academic Search 
Complete, and Google Scholar. Manual searches were also 
used within the Journal of Deaf Studies and Deaf Education.
The following key words were included in the search criteria:
Deaf, hard-of-hearing, hearing impaired, hearing loss, single-case,
single-subject, intervention. These search terms are not
assumed to cover the full range of characteristics, identities,
and terminology that are associated with individuals who 
are DHH; they are instead vocabulary and key words that are 
commonly used in the research literature. The search yielded 
768 articles, dissertations, book chapters, and other literature 
that was potentially suitable for our analysis based on arti- 
cle abstracts. Publication titles and abstracts were screened to 
determine which studies should be considered for inclusion. 
After the initial screening, 97 studies were identified for fur- 
ther review. 

Each publication was screened for inclusion eligibility with
the following criteria: (a) participants were identified as DHH (e.g.,
deaf, hearing impaired, with a hearing loss, late-deafened, hard-of-hearing)
or DHH with an additional disability; (b) the research
design was a variation of SCD that demonstrated experimental
control and replication as outlined by the WWC (i.e., multiple-baseline
or multiple-probe, reversal, alternating treatments,
changing criterion); (c) the unit of analysis was the individual
(i.e., not the classroom); (d) the data were presented
graphically to allow for visual analysis; and (e) the study was published
in a peer-reviewed journal. Only peer-reviewed research was included
because these articles are most likely to
be used in summaries of evidence in support of specific interventions.
Of the 97 publications that were initially retrieved, 12
were eligible for inclusion in this analysis. A diagram of the lit- 
erature search process is provided in Figure 2 and a summary of 
included articles is provided in Table 1. 

Figure 2. Single-case design article inclusion flowchart.

The WWC Standards for SCD Research

The specific criteria used to assess each study are based
on the most recent pilot version of the WWC single-case
intervention research design standards, which are documented
in the Procedures and Standards Handbook, version 3.0 (Kratochwill
et al., 2010/2014). The WWC was established in 2002 by the Institute of
Education Sciences to provide evaluations of the quality of studies
that test the effects of educational and psychological interventions.
Although the WWC initially focused on the analysis of
group-based methodologies, the Procedures and Standards Handbook 
now proposes guidelines for designing and evaluating a research 
study using SCD. The guidelines for evaluating the SCD research 
include protocol for determining the strength of the research 
design to ensure sufficient internal validity, and the strength of 
the evidence through visual analysis of data (Maggin, Chafouleas, 
Goddard, & Johnson, 2011). This protocol will be summarized in 
the following sections, 1 2 followed by a description of the quality 
and analysis of evidence conducted with the present sample. 

Assessment of design quality 

The first step in quality assessment was an evaluation of the 
studies’ design properties, which yielded quality rankings of 
either (a) Meets WWC Pilot SCD Standards without Reservations, 
(b) Meets WWC Pilot SCD Standards with Reservations, or (c) 
Does Not Meet WWC Pilot SCD Standards. This process is sum- 
marized in Figure 3. The four criteria for WWC design quality 
(described below) were assessed on a dichotomous scale (yes or 
no). In order to meet design standards without reservations, all 
of the following criteria must be met: 

1. Independent variable: The investigators must systemati- 
cally manipulate the IVs. In other words, the researchers 
must determine how and when the IV is introduced. 

2. Inter-observer agreement: Each dependent variable must 
be measured systematically over time by no less than two 
investigators, using an accepted measure of inter-observer 
agreement (IOA) to determine agreement for each baseline 
across each dependent variable. Documentation of IOA 
must be reported for each phase, for no less than 20% of 
the points within each phase. Agreement must be no less 
than 0.80 if measured by percentage agreement, and 0.60 if 
using Cohen’s kappa. 

3. Attempts to demonstrate effects over time: A study must
include at least three attempts to demonstrate the effects
of the IV at different points in time. 3 Intervention data
that overlap across cases do not represent an attempt to
demonstrate an effect. Further clarification on this criterion
includes the following:

• Some commonly used SCDs do not meet this standard, 
including AB, ABA, and BAB designs. 

• Multiple-baseline designs must have at least three baseline 
phases, and at least three intervention phases (three A and 
three B phases). 

• For alternating and simultaneous designs, five demonstra- 
tions of an effect are required. 

4. Data points per phase: In order to constitute an attempt 
to demonstrate an effect, studies need to meet criteria 
with regard to the number of data points per phase. There 
is some flexibility in the application of these standards, 
depending on the unique properties of the research pop- 
ulation and intervention. The principal investigator must 
provide strong justification for applying alternate criteria. 
Further clarification includes the following: 

• For reversal/withdrawal designs (e.g., ABAB) to meet design 
standards without reservations, there must be at least two 
cases with four phases per case, with a minimum of five data 
points per phase. In order to meet design standards with res- 
ervations, phases must have at least three data points per 
phase. Phases with fewer than three data points are not suf- 
ficient to demonstrate an effect. 

• For multiple-baseline and multiple-probe designs, there 
must be a minimum of six phases with at least five data 
points per phase to meet design standards without reserva- 
tions. To meet standards with reservations, there must be a 
minimum of six phases with at least three data points per 
phase. 

• For alternating treatment designs, there must be a minimum
of five data points per condition and two data points
per phase at most to meet design standards without reservations.
To meet design standards with reservations, there
must be four data points per condition and two data points
per phase. Phases with more than two data points cannot
be used to demonstrate an effect because the design calls
for fast alternations between phases. Designs that attempt to
compare multiple interventions are rated individually.

Table 1. Analyzed studies

Each entry lists the study authors, n, age, hearing status (degree of loss; amplification), additional disability, intervention, and outcomes.

S1. Allgood, Heller, Easterbrooks, and Fredrick (2009). n: 5. Age: 17-20. Degree of loss: Moderate, profound. Amplification: NR. Additional disability: Mild ID, moderate ID. Intervention: Instructional: Picture dictionaries. Outcomes: Functional communication in vocational setting.

S2. Heller, Allgood, Ware, Arnold, and Castelle (1996). n: 4. Age: 18-19. Degree of loss: Moderate, profound. Amplification: NR. Additional disability: Mild ID, moderate ID, visual impairment. Intervention: Instructional: Dual communication boards. Outcomes: Functional communication in vocational setting.

S3. Beal-Alvarez et al. (2012). n: 1. Age: 5. Degree of loss: Moderate. Amplification: HA. Additional disability: NR. Intervention: Instructional: Grapheme-phoneme correspondence curriculum combined with visual phonics administered individually. Outcomes: Emergent literacy acquisition.

S4. Beal-Alvarez et al. (2012). n: 3. Age: 4-5. Degree of loss: Moderate, severe. Amplification: HA. Additional disability: NR. Intervention: Instructional: Grapheme-phoneme correspondence curriculum combined with visual phonics administered in a small group. Outcomes: Emergent literacy acquisition.

S5. Bergeron et al. (2009). n: 5. Age: 3-7. Degree of loss: Moderate, severe. Amplification: CI, HA. Additional disability: NR. Intervention: Instructional: literacy curriculum from children’s early intervention (CEI) program. Outcomes: Knowledge of correspondence between alphabetic phonemes and graphemes.

S6. Bergeron et al. (2009). n: 5. Age: 3-4. Degree of loss: NR. Amplification: CI. Additional disability: NR. Intervention: Instructional: Component of foundations for literacy program. Outcomes: Knowledge of correspondence between alphabetic phonemes and graphemes.

S7. Cannon, Fredrick, and Easterbrooks (2009). n: 4. Age: 10-12. Degree of loss: NR. Amplification: NR. Additional disability: NR. Intervention: Instructional: DVD expository books. Outcomes: Vocabulary words.

S8. Cohen, Allgood, Heller, and Castelle (2001). n: 3. Age: 17-20. Degree of loss: Moderate, severe, profound. Amplification: NR. Additional disability: Borderline ID, mild ID. Intervention: Instructional: Picture dictionaries. Outcomes: Functional communication in vocational setting.

S9. Miller, Lederberg, and Easterbrooks (2012). n: 5. Age: 3-5. Degree of loss: Moderate, severe. Amplification: CI, HA. Additional disability: None. Intervention: Instructional: Component of foundations for literacy program. Outcomes: Phonemic and phonological awareness.

S10. Mueller and Hurtig (2010). Age: 2-4. Degree of loss: Mild, moderate, severe, profound. Amplification: CI, HA, unaided. Additional disability: None. Intervention: Instructional: Reading program from The Iowa Signing E-book. Outcomes: Time on-task during reading session; acquired sign vocabulary.

S11. Neef and Iwata (1985). n: 2. Age: 23-26. Degree of loss: NR. Amplification: NR. Additional disability: None. Intervention: Instructional: Sequential cued speech training. Outcomes: Acquisition of lip reading skills; expressive articulation responses.

S12. Van Hasselt, Hersen, Egan, Mckelvey, and Sisson (1989). n: 2. Age: 21. Degree of loss: Moderate. Amplification: NR. Additional disability: Blindness, cerebral palsy, severe ID, seizure disorder, cardiac disease. Intervention: Behavioral reinforcement training. Outcomes: Increase of on-task behaviors; appropriate social interaction; decrease of self-injury, disruption, and stereotypy.

Note. CI, cochlear implant; HA, hearing aid; NR, not reported.
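
As a rough illustration of how the inter-observer agreement thresholds and the data-points-per-phase rules for multiple-baseline and multiple-probe designs could be checked mechanically, consider the Python sketch below. The function names, the example observer codes, and the decision to rate a design on phase counts alone are assumptions made for illustration; the sketch does not reproduce the full WWC rating procedure or the authors' coding process.

```python
from collections import Counter
from typing import List, Sequence


def percent_agreement(obs_a: Sequence[str], obs_b: Sequence[str]) -> float:
    """Proportion of observations on which two observers recorded the same code."""
    return sum(a == b for a, b in zip(obs_a, obs_b)) / len(obs_a)


def cohens_kappa(obs_a: Sequence[str], obs_b: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(obs_a)
    p_observed = percent_agreement(obs_a, obs_b)
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def ioa_meets_standard(value: float, metric: str) -> bool:
    """Thresholds stated in the text: 0.80 for percentage agreement,
    0.60 for Cohen's kappa."""
    thresholds = {"percent": 0.80, "kappa": 0.60}
    return value >= thresholds[metric]


def rate_multiple_baseline(points_per_phase: List[int]) -> str:
    """Rate a multiple-baseline or multiple-probe design using only the
    number of phases and the data points in each phase."""
    if len(points_per_phase) < 6 or min(points_per_phase) < 3:
        return "Does Not Meet"
    if min(points_per_phase) >= 5:
        return "Meets without Reservations"
    return "Meets with Reservations"


# Hypothetical codes from two observers across ten sessions.
observer_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
observer_2 = ["on", "on", "off", "on", "on", "on", "on", "off", "on", "on"]

print(ioa_meets_standard(percent_agreement(observer_1, observer_2), "percent"))
print(ioa_meets_standard(cohens_kappa(observer_1, observer_2), "kappa"))
print(rate_multiple_baseline([5, 3, 5, 4, 6, 5]))
```

A full rating would also need to confirm systematic manipulation of the IV, the proportion of data points with IOA in each phase, and the number of attempts to demonstrate an effect, none of which are modeled here.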

Figure 3. WWC quality rating criteria for SCD studies. SCD, single-case design; WWC, What Works Clearinghouse.

Studies that used a multiple-probe design were assessed with
additional criteria, outlined below. Multiple-probe studies that
did not meet all of the following criteria received a rating
of Does Not Meet WWC Pilot SCD Standards.

1. Initial baseline sessions must overlap vertically. The design
must include three consecutive probe points for each
case within the first three baseline sessions. Studies with
at least one probe point for each case within three initial
baseline sessions Meet Pilot Standards with Reservations.

2. Probe data must be available in the sessions immediately 
prior to the introduction of the intervention. Within the 
three sessions just prior to the introduction of the inter- 
vention, there must be three consecutive probe points for 
each case. At least one probe point within the three ses- 
sions before the intervention phase is required to Meet 
Pilot Standards with Reservations. 

3. For each case that is not currently receiving the interven- 
tion, there must be a probe data point during a session 
where another case either (a) first receives the interven- 
tion, or (b) reaches the predetermined intervention crite- 
rion level. This point must be consistent in level and trend 
with the case’s previous baseline points. 
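
The probe-placement checks above can be sketched in code. The following is a minimal illustration, not part of the WWC protocol or of the authors' review procedure: the function names, the session-numbering scheme (1-based, with each case's first intervention session at 4 or later), and the example cases are all hypothetical. It covers only part (a) of criterion 3 and does not assess whether a probe point is consistent in level and trend with earlier baseline points, which remains a matter of judgment.

```python
from typing import Dict, Set, Tuple

MEETS = "Meets"
RESERVATIONS = "Meets with Reservations"
DOES_NOT_MEET = "Does Not Meet"


def rate_window(probes: Set[int], first: int) -> str:
    """Rate sessions first..first+2 by how many contain a probe point."""
    hits = sum(1 for s in range(first, first + 3) if s in probes)
    if hits == 3:  # three consecutive probe points
        return MEETS
    if hits >= 1:  # at least one probe point
        return RESERVATIONS
    return DOES_NOT_MEET


def rate_case(probes: Set[int], start: int) -> Dict[str, str]:
    """probes: baseline sessions with a probe; start: first intervention session (assumed >= 4)."""
    return {
        "criterion_1": rate_window(probes, 1),          # first three baseline sessions
        "criterion_2": rate_window(probes, start - 3),  # three sessions before intervention
    }


def criterion_3(cases: Dict[str, Tuple[Set[int], int]]) -> bool:
    """Each case still in baseline needs a probe in any session where another case starts intervention."""
    for name, (probes, start) in cases.items():
        for other, (_, other_start) in cases.items():
            if other != name and other_start < start and other_start not in probes:
                return False
    return True


# Hypothetical three-case multiple-probe design.
cases = {
    "A": ({1, 2, 3}, 4),
    "B": ({1, 2, 3, 4, 6, 7, 8}, 9),
    "C": ({1, 4, 9, 12}, 13),
}
ratings = {name: rate_case(probes, start) for name, (probes, start) in cases.items()}
```

In this invented example, cases A and B satisfy criteria 1 and 2 outright, while case C has only one probe point in each window and would therefore meet those criteria with reservations.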

Steps for visual analysis 

The next step was to apply the WWC protocol for visual analysis
of data (see Note 4) to studies that met design standards with or
without reservations. Recall that the rationale for using visual
analysis of data in SCD studies is that observed changes in the
dependent variable can be associated with manipulation of the IV,
which is based on the observation of predicted and replicated
data across cases. Visual analysis is used to determine whether
the data demonstrate at least three indications of an effect at
different points throughout the study.

The quality of the effect demonstrations was assessed to determine
whether the study provided strong evidence, moderate
evidence, or no evidence of a causal relationship. An intervention
effect is demonstrated by documenting six experimental
properties. The first three properties are measured within
phases, while the last three refer to properties of the data patterns
across phases:

1. Level: The mean score of the outcome measures within a 
phase. 

2. Trend: The slope of the best fitting line for the data within 
a phase. 

3. Variability: The degree of overall scatter around the line 
of best fit. Generally speaking, increased data variability 
detracts from the strength of the relationship between vari- 
ables. 

4. Immediacy of the effect: Refers to the change in level 
between the last three data points in one phase, and the 
first three data points of the next. An immediate change in 
an effect between phases provides a stronger indication of 
an intervention effect. 

5. Overlap: The proportion of data from one phase that 
overlaps with data from the previous phase. In general, a 
smaller proportion of overlapping data points leads to a 
stronger demonstration of the intervention effect. 

6. Consistency of data in similar phases: Involves looking at 
the data from all phases within the same condition (e.g., 
baseline vs. intervention) and investigating the consist- 
ency in the data patterns from phases with the same con- 
ditions. Increased consistency is an indication of a causal 
relationship between the independent and dependent 
variables. 
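
Although the WWC protocol relies on visual inspection rather than computed statistics, the first five properties can be given simple numerical counterparts. The sketch below is one possible operationalization under stated assumptions, not the WWC's or the authors' procedure: the function names and the example data are hypothetical, variability is taken as the root-mean-square residual around the least-squares line, and overlap is taken as the share of points in the later phase that fall within the range of the earlier phase. Consistency across similar phases (property 6) is left to the analyst.

```python
from statistics import mean
from typing import Sequence


def level(phase: Sequence[float]) -> float:
    """Mean score within a phase."""
    return mean(phase)


def trend(phase: Sequence[float]) -> float:
    """Least-squares slope of the data within a phase (sessions indexed 0, 1, 2, ...)."""
    xs = range(len(phase))
    x_bar, y_bar = mean(xs), mean(phase)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, phase))
    return sxy / sxx


def variability(phase: Sequence[float]) -> float:
    """Root-mean-square scatter around the line of best fit (an assumed definition)."""
    slope = trend(phase)
    intercept = mean(phase) - slope * mean(range(len(phase)))
    residuals = [y - (intercept + slope * x) for x, y in enumerate(phase)]
    return (sum(r * r for r in residuals) / len(phase)) ** 0.5


def immediacy(prev: Sequence[float], nxt: Sequence[float]) -> float:
    """Change in level from the last three points of one phase to the first three of the next."""
    return mean(nxt[:3]) - mean(prev[-3:])


def overlap(prev: Sequence[float], nxt: Sequence[float]) -> float:
    """Proportion of points in the later phase lying within the earlier phase's range."""
    lo, hi = min(prev), max(prev)
    return sum(lo <= y <= hi for y in nxt) / len(nxt)


# Hypothetical data for one baseline phase and one intervention phase.
baseline = [2, 3, 2, 3, 2]
intervention = [7, 8, 9, 9, 10]
summary = {
    "baseline_level": level(baseline),
    "intervention_trend": trend(intervention),
    "immediacy": immediacy(baseline, intervention),
    "overlap": overlap(baseline, intervention),
}
```

For these invented data, the intervention phase shows no overlap with baseline and a large immediate change in level, the pattern that the text describes as a stronger indication of an intervention effect.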
If an SCD study provided three demonstrations of an intervention
effect, as determined through visual analysis of the above properties
by at least two trained observers, then the study was considered
to demonstrate strong evidence of a causal relationship. If a
study did not provide three demonstrations of an effect at different
points in time, then the study was rated as no evidence. If a study
provided three demonstrations of an effect, plus one demonstration
of a noneffect, then the study was considered to provide moderate
evidence of an intervention effect. A noneffect is indicated
by any of the following criteria: (a) data in the baseline phase are not
stable enough to provide a meaningful comparison with data during
the intervention phases; (b) high variability throughout phases,
or inconsistent data trends across similar intervention phases; (c)
a long latency between the introduction of an intervention and an
observed effect; (d) overlap of outcome effects between baseline
and intervention phases; or (e) inconsistency in outcomes between
different cases (e.g., subjects, behaviors, or settings).
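
The rating rule in the preceding paragraph can be written as a small decision function. This is a sketch only: the function name is hypothetical, the counts of effects and noneffects are assumed to come from the observers' visual analysis, and treating more than one noneffect the same as exactly one is an assumption that goes beyond the text.

```python
def evidence_rating(effects: int, noneffects: int) -> str:
    """Map counts of demonstrated effects and noneffects to an evidence rating."""
    if effects < 3:
        # Fewer than three demonstrations of an effect at different points in time.
        return "no evidence"
    if noneffects >= 1:
        # Three demonstrations plus a noneffect (assumed: one or more).
        return "moderate evidence"
    return "strong evidence"


# Hypothetical outcomes of visual analysis for three studies.
examples = [evidence_rating(3, 0), evidence_rating(3, 1), evidence_rating(2, 0)]
```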

Results 

The following section describes an analysis, using the WWC
standards, of the design quality and strength of evidence of SCD
intervention studies that include DHH participants. Table 2
summarizes the results of the design quality analysis for the
articles included in this analysis.

Coder Reliability for Design Standards 

The first three authors coded each article utilizing the WWC 
design quality standards. Each coder evaluated the design qual- 
ity of each study across the individual criterion levels, which 
led to one of three comprehensive ratings in accordance with 
the WWC design standards: (a) Meets WWC Pilot SCD Standards 
without Reservations; (b) Meets WWC Pilot SCD Standards with 
Reservations; (c) Does not Meet WWC Pilot SCD Standards. After 
an initial training period, IOA in rating the sample of SCD stud- 
ies was 100% across all criteria. 

Design Quality Outcomes 

The majority of studies within the sample (n = 8 out of 12) did
not meet the standards for WWC design quality. Across these
studies, there were several weaknesses that related to application
of SCD methodology, experimental validity, and overall
design rigor. The weaknesses that compromised design quality
fell under three categories: (a) failure to meet IOA criteria;
(b) insufficient demonstration of intervention effects; or (c)
inadequate experimental control. According to the WWC framework,
a study that fails to meet any one of the design quality
criteria does not meet standards (see Note 5).

IOA criteria 

Among the studies that did not meet design standards, a common
flaw was the failure to meet IOA criteria, with three studies
not fulfilling these criteria. Failure to meet quality standards
under IOA criteria resulted from insufficient percent agreement
between observers (less than 0.80) or from the investigators’
failure to measure or report IOA. For studies that reported an IOA
above 0.80, it was often the case that the investigators did not
report the number of sessions or cases for which an IOA measure
was taken. Insufficient reporting of IOA resulted in questionable
validity of the studies’ data, which detracted from the
strength of the reported effects.
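
As an illustration of the percent-agreement threshold mentioned above, the following sketch computes point-by-point agreement between two observers and compares it with 0.80. The function names and the observation records are hypothetical, and point-by-point agreement is only one of several ways IOA can be calculated; the reviewed studies may have used other indices.

```python
from typing import Sequence


def percent_agreement(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Proportion of observation intervals on which two observers recorded the same value."""
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("Both observers must score the same, nonzero number of intervals.")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return agreements / len(obs_a)


def meets_ioa(agreement: float, threshold: float = 0.80) -> bool:
    """True when agreement reaches the 0.80 level discussed in the text."""
    return agreement >= threshold


# Hypothetical interval records (1 = behavior observed, 0 = not observed).
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
ioa = percent_agreement(observer_1, observer_2)
```

Here the observers disagree on one of ten intervals, so the agreement is 0.9 and the threshold is met.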

Insufficient demonstration of intervention effects and 
inadequate experimental control 

Three studies had several methodological flaws that compromised
the integrity of the effect demonstration. An uncommon
but significant design issue resulted from a failure to manipulate
IVs under controlled conditions, which compromised
experimental integrity. For example, one study did not
administer the IV in an experimentally controlled condition
(i.e., the intervention was administered in the participants’ homes
and was not supervised by a trained investigator or accompanied
by checks for fidelity). Another design weakness resulted from
multiple intervention phases with fewer than three data points,
and insufficient opportunities for comparison of outcome data
across phases, baselines, and participants. Studies that did not
meet standards under this category also failed to meet standards
for adequate attempts to demonstrate an effect. However,
the presence of a phase with fewer than three data points was
not an automatic disqualifier for a study. If the remaining phases
with three or more data points were sufficient to demonstrate
three effects, then the study was still eligible to meet standards
in this domain.

Table 2. Results of design quality analysis

Study | Manipulation of IV | Three attempts to demonstrate an effect | Three data points in each phase | IOA | Criteria for multiple probe | Design rating
S1 | Yes | Yes | Yes | Yes | Yes w/reservations (1) | Meets Pilot Design Standards with Reservations
S2 | Yes | Yes | Yes | Yes | Yes w/reservations (1) | Meets Pilot Design Standards with Reservations
S3 | Yes | Yes | No (3) | Yes | No (1, 3, 6) | Does not Meet Pilot Design Standards
S4 | Yes | Yes | Yes | Yes | No (6) | Does not Meet Pilot Design Standards
S5 | Yes | Yes | Yes | No (2) | No (5) | Does not Meet Pilot Design Standards
S6 | Yes | Yes | Yes | No (2) | No (5, 6) | Does not Meet Pilot Design Standards
S7 | Yes | No (3, 6) | No (3) | Yes | No (1, 6) | Does not Meet Pilot Design Standards
S8 | Yes | Yes | Yes | Yes | Yes w/reservations (1) | Meets Pilot Design Standards
S9 | Yes | No (6) | No (3) | Yes | No (1, 5, 6) | Does not Meet Pilot Design Standards
S10 | No (4) | Yes | No (3) | No (2) | Not applicable | Does not Meet Pilot Design Standards
S11 | Yes | Yes | No (3) | Yes | Not applicable | Does not Meet Pilot Design Standards
S12 | Yes | Yes | Yes | Yes | Not applicable | Meets Pilot Design Standards

Note. Numbers in parentheses: 1 = Fewer than three probe points at the start of the baseline phase for each case, 2 = No IOA data gathered or reported, 3 = Baseline and/or experimental
phase(s) with fewer than three total data points, 4 = Inadequate experimental control, 5 = Insufficient data points just prior to introduction of IV, 6 = Probe data missing
or not vertically aligned according to multiple-probe standards. IOA, inter-observer agreement; IV, independent variable.

Another issue under this category related to insufficient 
data, specifically in the baseline phases. Insufficient baseline 
data within and across baseline phases detracts from the inter- 
nal validity of a multiple-baseline design, as adequate baseline 
data is necessary for effect demonstration across all variations 
of SCDs. Periodic data collection is not always an indicator of 
insufficient baseline data, as adequate baseline data is a feature 
of well-constructed multiple-probe designs, which was a com- 
monly used variation of SCD within our sample. However, this 
flaw was most prevalent across studies using the multiple-probe 
design. 

Other methodological issues 

Another design problem within our sample resulted from insuf- 
ficient number and placement of data points within studies 
that used the multiple-probe design. Because multiple-probe 
designs use periodic data collection, they are generally more 
cost efficient than designs that utilize continuous data collec- 
tion. However, the cost for increased resource efficiency is the 
potential reduction of methodological rigor. Periodic data col- 
lection can detract from the strength of a demonstrated effect, 
especially if there is high variability in the data patterns. For 
example, if probe data is inconsistent or unexpected given the 
time sequencing of treatment application (i.e., if a behavior is 
observed that is unlikely to occur without prior training), then 
it would be more difficult to attribute the observed effects to 
the intervention, especially if the design was not executed with 
methodological rigor. 

Results of visual analysis 

Following the analysis of design quality, the studies that were
determined to meet design standards with or without reservations
were evaluated for strength of evidence using the guidelines
for visual analysis outlined above. All four studies
that met design standards with or without reservations were
determined to have moderate to strong evidence (moderate evidence:
Van Hasselt et al., 1989; strong evidence: Allgood et al.,
2009; Cohen et al., 2001; Heller et al., 1996). The study that was
rated as providing moderate evidence (Van Hasselt et al., 1989)
had one documented noneffect, which resulted from a high
level of variability in the data in one of the phases.

Discussion 

In order to identify evidence-based practices for students
who are DHH, researchers are called to produce more stringent
study designs that are sensitive to the low incidence
and heterogeneity of the DHH population. The rising attention
surrounding the SCD as a mechanism for drawing causal
inferences about the effectiveness of interventions represents
a promising shift in the field and includes a call to define criteria
for structurally sound and methodologically rigorous SCD
research.

The purpose of this article was to apply the WWC standards 
for SCD studies in a review of the existing body of DHH research 
in order to determine the present state of the evidence using this 
methodological approach. At present, there is an insufficient
number of studies in any one domain that meet the standards of 
SCD design quality to constitute an evidence-based practice. We 
now discuss the challenges of applying the WWC design evalu- 
ation protocol to DHH research, as well as the limitations to the 
present analysis and implications for future research. 


Limitations 

There are several limitations to consider in the present analysis. 
First, although a thorough literature search was used to identify 
studies for inclusion in the present analysis, it is possible that 
other potentially relevant articles were unidentified, particularly 
unpublished dissertation studies that did not undergo external 
peer review. An additional limitation is due in part to the WWC 
standards being a new tool for evaluating SCD research. The 
standards were developed to bring increased consistency and 
objectivity to the evaluation of SCD research; however, the cur- 
rent standards do not take specific populations or applied con- 
texts into account, an issue discussed in further detail below. 

Development of the standards for SCD studies provides the 
first explicit definition of methodological quality in the history 
of SCD research (Kratochwill & Levin, 2014). The present analysis 
was conducted within an emerging area of inquiry, which on 
a field-wide level, is presently in a state of flux. Consequently, 
it is important to acknowledge the inherent limitations to 
using a newly developed set of criteria to assess existing work. 
Challenges to using the standards that apply specifically to DHH 
research include population neutrality (i.e., uniform application 
of guidelines across diverse populations) and unclear guidelines 
on areas for flexibility in their application. It would be impractical
to create individualized standards for every population, yet
it is also unreasonable to have all users of SCDs adhere
to a rigid, one-size-fits-all set of standards for research design
across multiple populations of interest and research questions.
Although the present WWC standards for SCD purport to offer 
flexibility in the application of the standards for certain criteria, 
there are no explicit guidelines for when to exercise this flexibil- 
ity when using the standards to evaluate a study for its adher- 
ence to them. 

An additional challenge to applying the WWC standards is 
in the lack of evaluation criteria for reporting participant char- 
acteristics and demographic information. Although it is not 
the responsibility of WWC to know the specific details of rel- 
evant demographic information across all psychological and 
educational research, it would be helpful for the guidelines to 
include this category as a place where each domain reflects on 
the inclusion or exclusion of attributes that may be theoretically 
relevant to the intervention and inferences about its efficacy. In 
the case of DHH research, it is especially important to consider 
participant variables and research contexts that are shaped by a 
unique set of participant characteristics (such as age of cochlear 
implantation or presence of a disability) and circumstances sur- 
rounding the practical application of experimental conditions 
(such as interventions administered in individual vs. group 
settings). Operationalized descriptions of participant charac- 
teristics are necessary for not only DHH populations, but also 
for other heterogeneous populations within special education 
research. Because there were no specific guidelines within the 
standards for evaluating the quality of demographic reporting, 
this article lacks a systematic evaluation of such variables. 

One final recommendation for future versions of the WWC 
standards for SCD research is the need for specific guidelines 
for the technical application of SCD principles (e.g., instructions 
for timing the introduction of an intervention across baselines 
in the multiple-baseline design). More specifically, the standards 
do not thoroughly address the need for appropriate sequenc- 
ing of introduction of the intervention for multiple-baseline 
and multiple-probe designs. In the current version of the standards,
the criterion states, “multiple-baseline and multiple-probe
designs implicitly require some degree of concurrence in the
timing of their implementation across cases when the intervention
is being introduced. Otherwise, these designs cannot be
distinguished from a series of separate AB designs” (Kratochwill
& Levin, 2014, p. E.4). This statement touches on an important
component of SCD methodology but does not provide enough 
guidance to execute a multiple-baseline or multiple-probe 
design with solid internal validity. Therefore, it is important that 
future versions of the standards incorporate more rigorous and 
thorough guidelines in this area. 

Implications for Future Research 

The WWC Standards for SCD research provide a methodological
framework that can potentially increase the consistency of
rigorous SCD application, which in turn will hopefully promote 
identification of evidence-based practices and interventions for 
DHH and other low-incidence populations. Considering that the 
articles reviewed in the present study were conducted prior to 
the development of the WWC standards for SCD, it would be 
counterproductive to discount these studies based on imperfect 
methodology, particularly in a field where the evidence base for
interventions and practices is still gaining capacity. One of the
implications of this analysis is that scholars in deaf studies and 
deaf education may need to engage in a dialog as to where the 
SCD standards are applicable (and to what degree), and places 
where the unique context of the field may call for flexibility in 
how the WWC SCD standards are applied. 

Within the field of applied research with individuals who 
are DHH, interventions are developed for a highly diverse group 
of individuals, for which group-based research methodologies 
are sometimes not appropriate. Consequently, the development 
of a consistent, documented methodology for evaluating SCD 
studies will likely lead to an increase in the identification and 
propagation of high-quality, evidence-based practices among 
traditionally underserved populations, including individuals 
who are DHH (Beal-Alvarez & Cannon, 2014; Maggin, Briesch, 
& Chafouleas, 2013). The present review serves as an indica- 
tor of the current state of single-case methodology applied to 
deafness research, and offers direction for future SCD research 
in the field. Ultimately, the continued use and refinement of 
the WWC method of quality and evidence analysis, paired with
a strong theoretical foundation and sufficient sampling across
the broad range of individuals in this population, should lead to 
a stronger evidence base of interventions and practices within 
DHH research. 

Conclusion 

Although the present sample of SCD studies does not, on its own
merit, provide sufficient empirical evidence to identify an
evidence-based practice in any one content area, these articles
undoubtedly contribute to the development of evidence-based
interventions and practices designed for individuals who
are DHH, summarized across all methodological approaches. To
assess this body of SCD work without considering experimental 
context would be problematic and unduly critical. Furthermore, 
it is necessary for authors to report (and reviewers to consider) 
relevant covariates such as age of onset or language develop- 
ment in order for the significance of the study findings and 
applications to specific subpopulations to be adequately con- 
sidered. SCDs hold the same responsibilities as other meth- 
odologies in clearly establishing the connections between the 
intervention and theoretical rationale for its efficacy with individuals
who are DHH. We acknowledge that the studies in our
sample represent the foundation of an important branch of
experimental research on issues related to interventions with 
individuals who are DHH, and that their findings still have sig- 
nificant impact on and application in the field. 

Although there are limitations to using the WWC stand- 
ards to evaluate the quality of SCD research in deaf studies 
and deaf education, the current standards do provide a solid 
framework for guiding future SCD research in the field. It is
our hope that this article will support a thoughtful critique of
SCDs in their current form and of their use in interpreting the
effects of interventions for individuals who are DHH. We also
encourage investigators to use the WWC standards as a platform
for discussing the validity of research findings using SCDs
and ways in which our field might apply them to improve the strength
of inferences about psychological and educational research.
Notes

1. Many different terminologies for individuals who are DHH 
are used across the research literature. Studies that focus 
on the cultural, linguistic, and societal aspects of the Deaf 
community, including use of sign language, group member- 
ship, children of Deaf adults, and so forth, will identify par- 
ticipants as “Deaf,” with a capital D (Padden & Humphries, 
1988). Either in conjunction with the above or in contrast, 
other studies refer to an individual’s “hearing loss” or as 
being “hearing-impaired,” either in terms of decibel loss, a 
range of loss such as “moderate” or “profound,” whether it 
is bilateral or in one ear, and in some cases, in terms of com- 
munication in their environment (e.g., has trouble hearing 
people on the telephone). Stemming from this audiologi- 
cal, medical perspective, “deaf” and “hard-of-hearing” are 
commonly used terms to describe study participants in a 
range of study contexts. Yet individual choices about com- 
munication modality and identities may vary not only 
between individuals, but also between different contexts 
and settings for an individual person (Harris, Holmes, & 
Mertens, 2009; Stanley, Ridley, Harris, & Manthorpe, 2011). 
In this paper we keep the terminology as used in the origi- 
nal research article, unless otherwise stated. 

2. The WWC guidelines for evaluating SCD research are pro- 
vided to illustrate the review process used by the authors of 
the present article and are not intended to be instructional. 
For a more comprehensive description of the article review 
process, readers are advised to refer to the complete WWC 
SCD standards handbook, available from
http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf

3. At the discretion of the principal investigators, it is
sometimes acceptable for a research design to demonstrate
two replications, rather than three. Reviewers must provide
an explanation as to why fewer than three demonstrations
provide sufficient evidence of an intervention effect.

4. There are no universally accepted conventions for visual 
analysis. The steps for visual analysis described in the pre- 
sent article are from the WWC Handbook for evaluating 
SCD research and are provided in an abbreviated format. 
To access the full version of the steps for visual analysis by 
the WWC, please refer to the complete Handbook:
http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf

5. The current standards allow the principal investigator to 
apply exceptions to the standards, based on the nature of 
the intervention, outcome variable(s), and population of 
interest. Exceptions to the standards must be justified and 
specified in the review protocol. 